Article 


Protein Science 
DOI 10.1002/pro.3208 


SARS-Unique Fold in the Rousettus Bat Coronavirus HKU9 

Robert G. Hammond, Xuan Tan, and Margaret A. Johnson* 

Department of Chemistry, University of Alabama at Birmingham 

Running Head: BatCoV SARS-Unique Fold 

Address correspondence to Margaret A. Johnson, Department of Chemistry, CHEM 274, 
University of Alabama at Birmingham. 1720 2nd Ave. South, Birmingham, AL 35294 
Phone: 205-934-8137; Fax: 205-934-2543; Email: maggiejohnson@uab.edu. 

Manuscript Pages: 36 
Supplementary Material Pages: 3 
Supplementary Material Tables: 2 
Supplementary Material Figures: 1 
Supplementary Material Description: 

NOEs used for the structure determination (shown in figure); - filename FIGS1.tif. 

Table of oligonucleotides; - filename SupplementaryTable-.docx. 

Table of validation tests and results: filename TableS2.docx. 

Figure legend (in manuscript file). 

This article has been accepted for publication and undergone full peer review but has not been 
through the copyediting, typesetting, pagination and proofreading process which may lead to 
differences between this version and the Version of Record. Please cite this article as 
doi: 10.1002/pro.3208 
© 2017 The Protein Society 

Received: Mar 01, 2017; Revised: May 26, 2017; Accepted: May 26, 2017 

This article is protected by copyright. All rights reserved. 









Abstract 

The coronavirus nonstructural protein 3 (nsp3) is a multifunctional protein
that comprises multiple structural domains. This protein assists viral polyprotein
cleavage and host immune interference, and may play other roles in genome replication or
transcription. Here we report the solution NMR structure of a protein from the
“SARS-unique region” of the bat coronavirus HKU9. The protein contains a frataxin fold or
double-wing motif, which is an α + β fold that is associated with protein/protein
interactions, DNA binding, and metal ion binding. High structural similarity to the
human severe acute respiratory syndrome (SARS) coronavirus nsp3 is present. A
possible functional site that is conserved among some betacoronaviruses has been 
identified using bioinformatics and biochemical analyses. This structure provides strong 
experimental support for the recent proposal advanced by us and others that the “SARS- 
unique” region is not unique to the human SARS virus, but is conserved among several 
different phylogenetic groups of coronaviruses and provides essential functions. 

Keywords: SARS-unique domain, frataxin, double-wing motif, NMR, coronavirus, 
protein functional annotation, viral protein, nonstructural protein 

Significance: 

The three-dimensional structure of a protein in the SARS-unique region of the bat 
coronavirus HKU9 (Hong Kong University 9) was solved by NMR. The structure is 
highly similar to that of the human severe acute respiratory syndrome (SARS) 
coronavirus. This may indicate conserved functions among animal and human viruses. 
The fold reveals a potential functional site. This represents the first structure of a domain 
from a bat coronavirus HKU9 nonstructural protein. 

Introduction 



Coronaviruses are single-stranded, positive-sense, enveloped RNA viruses that infect 
both humans and animals. Coronavirus infections have a range of severity and include 
upper and lower respiratory symptoms, with a low frequency of acute lung injury and 
acute respiratory distress syndrome. 1 Acute gastrointestinal, hepatic, and neurological 
symptoms have also been observed. Since 2002, the human coronaviruses (CoVs) have 
emerged as significant public health threats. The severe acute respiratory syndrome 
(SARS) virus is the etiological agent of the 2003-2005 pandemic that affected more than 
30 countries. In 2012, the Middle East respiratory syndrome (MERS) virus emerged in 
the Middle East, followed by the spread of the virus to other countries (e.g. the UK, 
South Korea). As of 2016, there had been 1728 confirmed cases of MERS affecting 
persons in 27 countries. 4 Prior to these outbreaks, CoVs were known to be responsible
for mild upper and lower respiratory infections. For example, human CoV 229E and
OC43 cause a minority of respiratory tract infections. Based on phylogenetic and
serological analyses, the International Committee on Taxonomy of Viruses has placed
the CoVs in four genera, namely the Alphacoronaviruses,
Betacoronaviruses, Gammacoronaviruses and Deltacoronaviruses. 5 Under this
classification, the betacoronavirus genus has been divided into groups A to D, whereby the
SARS-like CoVs are found in group B and MERS-like CoVs in group C. Group D so
far has been detected only in bats. 6

Bats are reservoir hosts of multiple zoonotic viruses, including CoVs. Surveillance 
studies and phylogenetic analyses have shown that high genetic diversity exists among 
the SARS-like viruses present in bats, allowing for the possibility of recombination and
the evolution of new variants. 7 A bat virus with 96% nucleotide sequence identity to the
human SARS-CoV was shown to be capable of using the human ACE2 enzyme as a
receptor. This demonstrates the same mode of cell entry as the human SARS-CoV. The
bat SL-CoV-WIV1 could grow on human epithelial cells and Vero E6 cells, and was
neutralized by human SARS convalescent sera. This virus is a possible direct progenitor
of the human SARS-CoV. 8,9

Several group C betacoronaviruses, such as HKU4, HKU5, and PREDICT/PDF-2180,
have been identified in bats from distinct locations around the world. Some genome
regions in these bat viruses are highly conserved with respect to the human MERS virus;
for example, PREDICT/PDF-2180 shares 97% sequence identity with the MERS virus in
ORF1B. 10 It is hypothesized that RNA recombination either in the bat or in an
intermediate animal host gave rise to the MERS-CoV. 10 The HKU4 virus, which is
derived from the lesser bamboo bat (Tylonycteris pachypus), shares 92.4% RNA
polymerase, 67.4% spike protein, and 72.3% nucleocapsid amino acid identity with the
MERS CoV and is able to use the same receptor for attachment and entry (the cell surface
protein DPP4). The group D betacoronavirus Hong Kong University 9 (HKU9) is
also widely distributed, and has been detected in diverse species including Rousettus
leschenaulti, Hipposideros commersoni, Eidolon helvum, and Rousettus aegyptiacus
from Asia to Africa. 13-16

Whether bat CoVs undergo adaptation to intermediate hosts or are transmitted
directly to humans, it is clear that they pose a threat to human health. Hence, it is
imperative to understand bat CoV biochemical and biological functions. At present, only
one high-resolution structure of a BatCoV HKU9 protein domain is known, the spike
protein external receptor-binding domain (RBD). 17 This structure revealed critical new
information, such as the external subdomain adopting a helical fold versus the beta-sheet
topology observed in other betaCoV receptor domains. As a result, the HKU9 RBD does
not bind to the other betaCoV receptors, ACE2 and CD26, underlining the importance of
carrying out structural studies on bat proteins. Hence, we have initiated a program to
explore bat protein structure-function relationships, with the goal of determining
conserved versus divergent functions.

The CoV virion is composed of four structural proteins, which are believed to
assist genome packaging, cell entry and virus spread. In contrast, the replicase gene
directs the expression of two large nonstructural polyproteins, pp1a and pp1ab, that
become mature nonstructural proteins (nsps) after cleavage by viral proteases. These
proteins assemble into a replicase-transcriptase complex (RTC) that is responsible for
RNA genome replication, processing and transcription of sub-genomic RNAs.
Interference with the innate immune system, and other interactions with functions of the
host cell also localize to the nsps. Several of these functions are essential for viral
replication, growth and virulence. 18-25

The nonstructural protein 3 (nsp3) is a multifunctional protein consisting of
sixteen functional domains and 1,922 amino acid residues. This protein is the
largest component of the RTC. Nsp3 is one of the most divergent regions of the CoV
genome. The domain structure of nsp3 is variable among CoVs, with one or two
papain-like cysteine proteases, transmembrane regions, RNA-binding proteins, and one
or more macrodomains. Key functions of the nsp3 include protein/protein
interactions involved in replicase assembly and function; polyprotein processing by the
papain-like cysteine protease domain; and deubiquitinase activity involved in innate
immune system interference. There are one or more macrodomains in the protein, for
which roles in countering the host cell innate immunity have been demonstrated and
roles in viral RNA synthesis have been proposed. 40 A “SARS-unique region” with a
three-domain structure was identified in the nsp3 of SARS. The macrodomains in the
SARS-unique region were shown to be G-quadruplex binding proteins, and to interact
with the RCHY ubiquitin ligase to target p53 for degradation.' 5 ’ 41 ' 42 The smaller C- 
terminal domain in this region adopts a frataxin-like fold and has been shown to bind 
purine-rich RNA sequences. In the human SARS-CoV, the functions of this region 
were essential for viral replication. 43 However, based on discoveries since 2002 and the 
emergence of other viruses, it has been hypothesized that the “SARS-unique region” is in 
fact conserved in other viruses, in particular in the group B, C, and D betacoronaviruses. 

We are investigating the “SARS-unique region” of bat CoVs. Here, we report the 
solution structure of the small C-terminal domain of this region, which we term HKU9 C. 
We describe for the first time the structural and functional analysis of a nonstructural 
protein domain from the betacoronavirus lineage D. We also discuss the conserved 
elements of the nsp3 C domain compared to other proteins in the frataxin fold family, 
including a possible functional site that is conserved relative to the human SARS-CoV. 

Results 

NMR structure determination 

NMR experiments were performed with uniformly 15N,13C-labeled HKU9 C expressed
and purified from E. coli. The construct used contains the entire predicted C domain
(spanning the residues 573-646 of the nonstructural protein 3, nsp3), with an additional
N-terminal segment Ser-His-Met derived from fusion tag cleavage. These residues
correspond to the residues 1345-1418 of the replicase polyprotein 1ab of BatCoV HKU9
(Uniprot ID: P0C6W5). The numbering differs because the viral polyprotein is cleaved
by the viral protease PLpro to yield the mature viral nsp3. 36,44-46 We use the numbering
of the mature nsp3 herein. Multidimensional NMR experiments were performed to
assign 96% of the observable resonances of the peptide backbone and amino acid 
sidechains. All backbone 15 N and 1 H x resonances were assigned. The structure 
determination was carried out based on 3D 15N- and 13C-resolved [1H,1H]-NOESY
experiments that were analyzed with the J-UNIO suite of programs. 47 
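The relationship between the two numbering schemes quoted above can be sketched in a few lines of Python. This is only an illustration: the offset is inferred from the two residue ranges given in the text (573-646 in mature nsp3 and 1345-1418 in polyprotein 1ab), and the function names are hypothetical helpers rather than part of any software used in this work.

```python
# Residue ranges quoted in the text for the HKU9 C domain.
NSP3_RANGE = (573, 646)      # mature nsp3 numbering
PP1AB_RANGE = (1345, 1418)   # replicase polyprotein 1ab numbering

# Constant offset between the two schemes, inferred from the range starts.
OFFSET = PP1AB_RANGE[0] - NSP3_RANGE[0]


def nsp3_to_pp1ab(residue: int) -> int:
    """Convert a mature-nsp3 residue number to polyprotein 1ab numbering."""
    if not NSP3_RANGE[0] <= residue <= NSP3_RANGE[1]:
        raise ValueError(f"residue {residue} is outside the C domain construct")
    return residue + OFFSET


def pp1ab_to_nsp3(residue: int) -> int:
    """Convert a polyprotein 1ab residue number to mature-nsp3 numbering."""
    if not PP1AB_RANGE[0] <= residue <= PP1AB_RANGE[1]:
        raise ValueError(f"residue {residue} is outside the C domain construct")
    return residue - OFFSET


if __name__ == "__main__":
    # Trp 590 (nsp3 numbering) expressed in polyprotein numbering.
    print(nsp3_to_pp1ab(590))
```

Both range ends give the same offset, which is a quick consistency check on the quoted residue numbers.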

Table 1 displays the statistics of the structure calculation, indicating a high-quality 
structure determination. A dense network of long-range NOEs was observed and the 
sequential and medium-range NOE pattern was consistent with the secondary structures 
in the protein (Fig. S1). The ensemble of 20 conformers representing the solution
structure of the HKU9 C domain (RMSD 0.34 Å) is well-defined with the exception of
the N-terminal expression tag residues Ser -3 and His -2, and the C-terminal residue Lys
646. 
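As an illustration of what an ensemble RMSD of this kind measures, the following minimal sketch computes the average RMSD of each conformer to the mean coordinates. It assumes that the conformers have already been superposed and that each is supplied as a list of (x, y, z) tuples for the same set of atoms. The atom selection and superposition used for the value reported here are not specified in this section, so this is not a reproduction of that calculation.

```python
import math


def mean_structure(conformers):
    """Average the coordinates of pre-superposed conformers atom by atom."""
    n_conf = len(conformers)
    n_atoms = len(conformers[0])
    return [
        tuple(sum(conf[i][k] for conf in conformers) / n_conf for k in range(3))
        for i in range(n_atoms)
    ]


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equally sized coordinate lists."""
    squared = sum(
        (p[k] - q[k]) ** 2 for p, q in zip(coords_a, coords_b) for k in range(3)
    )
    return math.sqrt(squared / len(coords_a))


def ensemble_rmsd(conformers):
    """Mean RMSD of each conformer to the mean coordinates of the ensemble."""
    mean = mean_structure(conformers)
    return sum(rmsd(conf, mean) for conf in conformers) / len(conformers)


if __name__ == "__main__":
    # Toy two-conformer, one-atom ensemble (hypothetical coordinates in Å).
    toy = [[(0.0, 0.0, 0.0)], [(2.0, 0.0, 0.0)]]
    print(ensemble_rmsd(toy))
```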


[Table 1 here] 


Solution structure of the HKU9 C domain 

A fold consisting of six β-strands arranged in an antiparallel β-sheet, together with two
α-helices at the N- and C-termini that pack on one side of the sheet, is observed (Fig. 2).
The fold is described as a double-wing motif or frataxin-like fold and is classified as
similar to the N-terminal domain of CyaY, a bacterial regulatory protein. 49 The helices
rest in the same plane antiparallel to each other and contribute to one side of the
hydrophobic core [Fig. 2(A)]. The two helices, α1 and α2, comprise residues
574-585 and 636-644, respectively. The first beta strands β1 (591-592) and β2
(596-599) follow an extended loop after α1 and lead to the first β hairpin. The remaining
beta strands β3-β6 span the residues 602-609, 613-616, 622-626, and 629-632, forming
a curved β-sheet. The topology of the frataxin fold is shown in Figure 2(C).
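For readers who wish to cross-reference residue numbers against the secondary structure elements, the boundaries listed above can be held in a simple lookup table. The sketch below uses only the residue ranges quoted in this section; the function name and the "loop or terminus" label are illustrative choices.

```python
# Secondary structure boundaries quoted in the text (mature nsp3 numbering).
ELEMENTS = {
    "α1": (574, 585),
    "β1": (591, 592),
    "β2": (596, 599),
    "β3": (602, 609),
    "β4": (613, 616),
    "β5": (622, 626),
    "β6": (629, 632),
    "α2": (636, 644),
}


def element_of(residue: int) -> str:
    """Return the secondary structure element containing a residue."""
    for name, (start, end) in ELEMENTS.items():
        if start <= residue <= end:
            return name
    # Anything not inside a listed helix or strand.
    return "loop or terminus"


if __name__ == "__main__":
    for res in (578, 590, 617, 630):
        print(res, element_of(res))
```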

The hydrophobic core is primarily defined by residues from the α-helices and β-strands
[Fig. 2(B)]. The side chains from Val 575, Phe 578, Val 579, and Ile 582 in α1 and Val
636, Ala 639, Tyr 642, and Leu 643 in α2 make up the α-helix contribution to the
hydrophobic core. The side chains from Cys 597 and Val 599 in β2; Tyr 604, Thr 606,
Ile 607, and Cys 608 in β3; Thr 613, Leu 615, Cys 616, and Phe 617 in β4; and Leu 622,
Tyr 623, Ala 624, and Ile 625 in β5 additionally contribute to the hydrophobic core,
together with Gly 586, Ala 587, Trp 590, Asp 618, Asn 621, and Phe 633 located in loop
regions.



Functional analysis and predictions 

Structural alignment of HKU9 C to other proteins using the programs TM-Align 50 and
Dali 51 revealed structural similarity to betacoronavirus (β-CoV) C domains, frataxins, and
hypothetical proteins (Table 2A). The most structurally similar proteins are
other β-CoV C domains, namely those of the human SARS-CoV and murine hepatitis
virus (MHV). The HKU9 C fold is similar to these viral domains, with a similar
topology and overall backbone RMSD values of 1.7 Å and 2.2 Å, respectively. These
viral domains have conserved residues and a highly similar fold despite their low
sequence identity. Similarity to the frataxins is also evident, with RMSD values of
approximately 3 Å and 1-10% sequence identity. These proteins also show slightly
different topologies, with longer loops and secondary structure insertions between several
secondary structure elements [Fig. 3(C)].

Functional predictions of HKU9 C were based on an analysis of β-CoV C domain
structure-function relationships, together with COACH meta-server results. COACH 
creates a complementary profile and binding site prediction from TM-SITE and S-SITE 
and utilizes multiple structure-based programs (COFACTOR, FINDSITE, and 
Concavity) to derive ligand binding predictions. We used this consensus server 
approach to predict functional characteristics of HKU9-C (Table 2B). 

Based on similarities to the human SARS-CoV C domain, a possible function for HKU9
C is nucleic acid binding. To investigate this possibility, we conducted electrophoretic
mobility shift assays (EMSA) with a panel of RNA and DNA oligonucleotides including
purine-rich, pyrimidine-rich and G-quadruplex sequences. However, no oligonucleotide
binding was detected. A second possibility is that HKU9 C functions in concert with the
neighboring macrodomains, which are binding proteins and enzymes acting on ADP-ribose
and related metabolites. 53-55 Structural similarity and binding site similarity to
adenylate-binding proteins is also present. Chemical shift perturbation analysis was
carried out by titrating ADP and ADP-ribose, which are known ligands for macrodomains, 53
up to 20 times the protein concentration. No chemical shift changes or line
broadening in the [15N,1H]-HSQC spectrum were observed, indicating that no interactions or
complexes formed.
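A chemical shift perturbation analysis of this kind is often summarized with a combined 1H/15N shift difference per residue. The sketch below shows one common form of that calculation. The 15N scaling factor of 0.14, the 0.02 ppm threshold, and the example shifts are all assumptions chosen for illustration; none of them is taken from this work, which reports only that no changes were observed.

```python
import math


def combined_csp(delta_h: float, delta_n: float, n_weight: float = 0.14) -> float:
    """Combined 1H/15N shift change in ppm; n_weight is an assumed scaling."""
    return math.sqrt(delta_h ** 2 + (n_weight * delta_n) ** 2)


def perturbed_residues(free, bound, threshold_ppm: float = 0.02):
    """List residues whose combined shift change exceeds an assumed threshold.

    `free` and `bound` map residue numbers to (1H ppm, 15N ppm) peak positions.
    """
    hits = []
    for residue, (h_free, n_free) in free.items():
        if residue not in bound:
            continue  # peak missing in the ligand-bound spectrum
        h_bound, n_bound = bound[residue]
        if combined_csp(h_bound - h_free, n_bound - n_free) > threshold_ppm:
            hits.append(residue)
    return sorted(hits)


if __name__ == "__main__":
    # Hypothetical peak positions, for illustration only.
    free = {588: (8.20, 120.0), 590: (7.90, 118.5)}
    unchanged = dict(free)
    print(perturbed_residues(free, unchanged))
```

With identical peak lists the function returns an empty list, which corresponds to the outcome described above.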

Functional predictions based on binding site analysis suggested other possible ligands.
To investigate, chemical shift perturbation experiments were repeated with
cyanocobalamin (vitamin B12), zinc(II) ions, EDTA, and peptides. Again, no changes in
the spectrum were observed, suggesting other likely functions for HKU9 C.




Conservation of the SARS-unique region in betacoronaviruses 

The structure determination of HKU9 C revealed unexpected structural similarity with
the corresponding SARS-unique domain in the human SARS-CoV. These two sequences
share only 18% sequence similarity. An area of strong conservation is present around the
residues Arg 588 - Trp 590 in the loop joining α1 to β1, where the residues are conserved
[indicated by stars, Fig. 3(C)] and the protein surfaces have similar polar character [Fig.
3(A), 3(B)]. Additional similarity is present around the residues Lys 609 - Gly 611. In
particular, the residues Arg 588-Asp 589-Trp 590 in HKU9 and the residues Arg 670-Asp
671-Trp 672 in SARS adopt nearly identical side chain orientations (Fig. 4). This
suggests a possible conserved function between the two viruses. We describe this surface
as the conserved face (CF) of the protein. This surface is defined by the loop connecting
α1 and β1 and the beta turn between β3 and β4, near the C-terminus of the protein.

In contrast, the corresponding region of the MHV C domain has acidic and hydrophobic
character [Fig. 3(A), 3(B)]. This is a consequence of the substitution of the sequence Arg
588-Asp 589-Trp 590 by Thr-Asp-Trp and Lys 609-Arg 610-Gly 611 by Glu-Cys-Pro.
Since the three proteins share a low level of overall sequence identity (15-18%),
this difference would not have been apparent without a structural comparison.
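The local comparison described above can be made concrete with a small percent-identity calculation over aligned positions. The sketch below uses only the two motifs named in the text, written in one-letter code (Arg-Asp-Trp as RDW, Thr-Asp-Trp as TDW, Lys-Arg-Gly as KRG, Glu-Cys-Pro as ECP). The function is a hypothetical helper, and identity is defined here over non-gap columns, which is one of several conventions. It does not reproduce the overall 15-18% values, which come from the full structure-based alignment.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over aligned, non-gap columns of two sequences."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    # Keep only columns where neither sequence has a gap character.
    columns = [(a, b) for a, b in zip(aligned_a, aligned_b) if a != "-" and b != "-"]
    if not columns:
        return 0.0
    matches = sum(1 for a, b in columns if a == b)
    return 100.0 * matches / len(columns)


if __name__ == "__main__":
    # HKU9 versus MHV at the two motifs discussed in the text.
    print(percent_identity("RDW", "TDW"))  # loop joining α1 to β1
    print(percent_identity("KRG", "ECP"))  # turn between β3 and β4
```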

This structure represents clear evidence that the SARS-unique domain is also conserved
in bat CoVs. The overall structural similarity between the HKU9 C domain and the
SARS C domain, from betacoronavirus lineage B, was assessed by the program DALI. 51
The resulting RMSD value was 1.66 Å with a DALI score of 8.2, indicating a strong
match. The RMSD value for the MHV C domain from the betacoronavirus lineage A
was 2.16 Å, with a DALI score of 8.7. Since the β-CoV HKU9 belongs to lineage D, this
analysis reinforces the hypothesis advanced by us and others that the unique region of
SARS nsp3 is actually conserved across multiple β-CoVs. 30,32,43,56,57 A structure-based
sequence alignment of the C domains and related proteins is shown in Figure 3(C).

Residues such as Phe 578, Val 581, and Trp 590 are conserved when compared to the 
sequence and structure of the SARS and MHV orthologues. In contrast, Trp 590 is 
replaced by other aromatic or hydrophobic residues in the frataxins. Based on their low 
solvent accessibility, we conclude that these residues are likely to be important in 
stabilizing the fold, rather than for intermolecular interactions. However, other conserved
residues that contribute to the surface potential, such as Arg 588, Asp
589, Lys 609, Arg 610, and Gly 611 described above, are oriented to the same face
of the protein and are likely to be responsible for a shared function between the CoV groups
2b and 2d (Fig. 3). Correspondingly, these residues are not conserved throughout the
protein family.

The sequence alignment of Figure 3(C) reveals the conserved topology in the frataxin
fold family. It also reveals differences between the viral domains and more distantly
related proteins. The viral proteins retain a similar sequence length, have conserved
residues in both helices and the β-sheet, and align with high DALI scores of 8.2 and
above, indicating a strong match. These features are not conserved in the distantly
related frataxin-like folds. For example, the adenylate-binding AcsD domain (PDB ID:
2W04) has an extended loop with an alpha turn insertion between β1 and β2 and
another long loop between β3 and β4. DALI scores for the alignment of the HKU9 C
domain to the human frataxin (PDB: 3T3X) and to the bacterial frataxin (PDB: 4HS5) are
3.8 and 3.5, respectively. These scores are also significant (>2.0) and indicate a
conserved fold, but with some structural variability. 51 This is underscored by the
presence of structural insertions relative to the viral proteins.

Possible functions of the SARS-unique region in BatCoV HKU9

We employed bioinformatics analysis with the COACH meta-server to predict possible
functions for the bat CoV HKU9 C domain. Several possible functions emerged from this analysis.
One possible function is as a nucleic acid-binding protein, predicted by the COACH and
COFACTOR 59 servers (Table 2) with low confidence score values of 0.02 and 0.01. This
possibility is also highlighted by the sequence and structure alignment of the SARS and HKU9 C
domains. Several residues involved in the binding of SARS C to RNA are conserved in
HKU9 C. 35 The RNA-binding residues from the SARS-CoV protein, such as His 695
(β5-β6 loop), Gly 707 (β6-β7 loop) and Val 709 (β7 strand), align to Phe 617 (β4-β5
loop), Gly 627 (β5-β6 loop), and Val 630 (β6) in HKU9 C [Fig. 3(C)]. Additionally, a
distantly related viral frataxin, the C-terminal domain of the T4 activator MotA (PDB ID:
1KAF), binds an E. coli DNA promoter sequence. The MotA double-wing β-sheet
utilizes asparagine residues to bind DNA. These residues are not conserved; for instance,
one (Asn 187) aligns to Gly 627 in HKU9 C [Fig. 3(C)]. Consistent with this lack of
conservation, no nucleic acid binding was observed for HKU9 C. However, it is
possible that this function is present but requires the presence of neighboring nsp3
domains.

A second possible function for the HKU9 C domain is protein/protein interaction. In the
SARS-CoV, the SUD region interacts with host cell proteins to enhance p53
degradation. 41 The frataxins also have protein binding partners, where the interaction is
mediated by side chains that are exposed on the planar face of the β-sheet. It is notable
that in the viral proteins, the β-sheet face is smaller and less planar than that of the
frataxins. The latter proteins have the β1-β2 hairpin in the same plane as the β-sheet
[Fig. 3(A)]. However, in the β-CoV domains, the β1-β2 hairpin wraps over the β-sheet,
obscuring the side chains in β2-β6 from the protein surface. In addition, the β-sheet side
chains that are important to frataxin binding and catalysis such as Trp 155 and Arg 165
(human) or Arg 53 and Trp 61 (Psychromonas ingrahamii) are not conserved in the
β-CoV proteins. 61,62

Protein or peptide binding is another function that is predicted by the bioinformatic
analysis of HKU9 C (Table 2). A potential protein-binding site is predicted to be present
on the conserved face (CF) of the protein. The site shows structural and chemical
similarity to that of the AAA+ delivery protein 63 and the Nsll protein. 64 The surface
identified by this prediction includes the residues Arg 588-Asp 589-Trp 590 and Lys 609-Arg
610-Gly 611 that we have identified as a conserved functional site.

Analysis of the bioinformatics results reveals a theme with respect to HKU9 C surface
regions. The conserved face of the fold [Fig. 3(B)] is the only region of the protein that
was predicted to have protein-protein interactions, while the other surface regions were
predicted to recognize metal ions and small molecules [Table 2(B)]. The β-sheet side chains
are not solvent exposed, but side chains from the β2-β3 and β4-β5 loops and from the β6
strand could potentially bind small molecules. Interestingly, metal ion ligands such
as Ca2+ and Zn2+ were predicted to bind to the α-helices. To date, we have not been able to
experimentally confirm any metal ion binding activity or nucleic acid binding activity for
HKU9 C. The prediction of a possible protein/protein interaction function is intriguing
and awaits further experiment.

We hypothesize that the conserved face of the HKU9 C domain is a likely interface for
HKU9 C binding partners. Based on our FFAS (Fold and Function Assignment System)
analysis, 65 and on the experimental results reported here, we predict structural
conservation between the nsp3 proteins of the human SARS-CoV and bat HKU9. We
used this structural alignment to predict the linker regions that would join the HKU9 C
domain to the neighboring domains in nsp3. At the N-terminus of HKU9 C, a short,
three-residue linker is predicted to join the domain to the neighboring M domain, while at
the C-terminus, on the “conserved face” of the protein, a seven-residue linker joins the C
domain to the papain-like protease of the virus. 36,44,66 A longer linker would provide
flexibility to accommodate binding partners and interactions. This would be consistent with
our hypothesis that the conserved face of the protein near the C-terminus may harbor a
potential functional site for reactivity or binding to other biomolecules.

Conclusions 


The frataxin or double-wing fold of the bat HKU9 nsp3 C domain reported here has high
structural similarity to the human SARS-CoV C domain. Although there is low sequence
similarity to the other CoV nsp3 proteins, some residues are structurally conserved. The
conservation of specific surface polar residues relative to the human SARS virus may
indicate a conserved function among certain betacoronaviruses.



Materials and Methods 



Protein expression and purification 

The DNA sequence encoding the central region of nsp3 (37-1037) was obtained as a
codon-optimized synthetic gene from Genscript (Piscataway, NJ). The residues 573-646
of nsp3, corresponding to the recombinant HKU9 C domain, were cloned into the
pET-15b-TEV 67 vector from the Northeast Structural Genomics Consortium (DNASU).
The construct was expressed in E. coli strain BL21 (DE3) with a 6xHis tag. The Ser-His-Met
sequence at the N-terminus of the protein remained after tag cleavage with the
tobacco etch virus protease. The sample was prepared in both LB medium for natural
isotopic abundance and in minimal medium for uniform 15N- and 13C-labeling. These
samples were used for functional analysis and structure determination, respectively.

Sample conditions such as buffer, pH, and salt concentration were optimized based on
peak intensity and linewidth in the [15N,1H]-HSQC spectrum, leading to the selection of
20 mM sodium phosphate (pH 6.0), 150 mM NaCl, and 5 mM DTT. The protein was
monomeric as assessed by size-exclusion chromatography on a GE Healthcare HiLoad
26/600 Superdex™ 200 pg column.


NMR spectroscopy 

The C domain structure was determined based on multidimensional NMR experiments
using uniformly 15N- or [15N,13C]-labeled protein solution with 97% H2O/3% D2O (v/v).
All experiments were conducted on Bruker Avance III HD spectrometers (600 and 850
MHz) equipped with Bruker 5 mm TCI cryoprobes and on a Bruker Avance II 700 MHz
spectrometer equipped with a CP TCI H-C/N-D cryoprobe. The sequence-specific
backbone assignments were based on 3D HNCACB, CBCA(CO)NH, HNCA, HNHA,
and HNCO experiments. Aliphatic and aromatic side chain assignments were determined
using the ASCAN 68 protocol in the J-UNIO 47 suite of programs followed by interactive
correction and completion using 3D CC(CO)NH, HBHA(CO)NH, (HB)CB(CGCD)HD-COSY,
(HB)CB(CGCDCE)HE-COSY, 3D HC(C)H-TOCSY, 15N-resolved [1H,1H]-NOESY
(τm = 150 ms), 13C-resolved aliphatic [1H,1H]-NOESY (τm = 150 ms), and 13C-resolved
aromatic [1H,1H]-NOESY (τm = 150 ms) experiments. 69 All assignments were
verified manually using the CARA and CCPNmr Analysis programs. 70,71 1H chemical
shifts were calibrated from internal 3-(trimethylsilyl)propane-1-sulfonic acid (DSS) and
15N and 13C shifts were referenced indirectly.

The ATNOS and CANDID 73,74 algorithms in the J-UNIO suite were used to pick peaks in the
NOESY spectra and to calculate the structure of the C domain. A globular fold obtained
after the first cycle remained consistent throughout the calculation with a steady decrease
in RMSD of the ensemble. A set of 132 tight dihedral angle restraints was obtained
from the program Talos+. 75 A set of loose φ, ψ, and χ1 restraints produced by the
HABAS algorithm in CYANA 2.0 76 based on intraresidual and sequential NOEs
provided an additional 347 dihedral angle restraints for the structure calculation. 73,74 The
set of unambiguous NOE assignments obtained in the final cycle of calculation included
1828 restraints or 24 restraints/residue (Table 1). The 20 structures with the lowest
CYANA target function values in cycle 7 were further refined by energy
minimization using the AMBER03 force field in explicit solvent (TIP3PBOX) with a 10
Å box geometry using the web-based AMBER interface AMPS-NMR in the WeNMR
portal. 77
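The restraint counts quoted above can be checked with simple arithmetic. The sketch below assumes that the per-residue figure is taken over the full construct, that is, the 74 domain residues (573-646) plus the three tag-derived residues Ser-His-Met. That denominator is our inference from the rounded value of 24 and is not stated explicitly in the text.

```python
# Restraint counts quoted in the text.
NOE_RESTRAINTS = 1828
TALOS_DIHEDRALS = 132
HABAS_DIHEDRALS = 347

# Construct length: domain residues 573-646 plus the Ser-His-Met tag remnant.
DOMAIN_RESIDUES = 646 - 573 + 1
TAG_RESIDUES = 3
CONSTRUCT_RESIDUES = DOMAIN_RESIDUES + TAG_RESIDUES  # assumed denominator

noe_per_residue = NOE_RESTRAINTS / CONSTRUCT_RESIDUES
total_dihedrals = TALOS_DIHEDRALS + HABAS_DIHEDRALS

if __name__ == "__main__":
    print(f"residues in construct: {CONSTRUCT_RESIDUES}")
    print(f"NOE restraints per residue: {noe_per_residue:.1f}")
    print(f"total dihedral angle restraints: {total_dihedrals}")
```

Dividing by the 74 domain residues alone would give a value closer to 25, so the quoted figure of 24 is more consistent with counting the tag residues.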

Structure validation of the final ensemble employed the Protein Data Bank validation 
suite, MolMol 2K.2, and ProCheck 3.5.4 from the PSVS suite 1.5. 78-81 Validation also 
employed input and output of the UNIO calculation and agreement of the structure with
NMR observables (Table S2). 

The atomic coordinates of the ensemble of conformers of Figure 2(B) have been 
deposited in the Protein Data Bank with accession number 5UTV. The sequence-specific 
resonance assignments have been deposited in the BioMagResBank with accession 
number 30247. 


Chemical shift perturbation experiments with BatCoV C domain 

NMR titrations were conducted by diluting the protein sample with NMR buffer
consisting of 20 mM sodium phosphate (pH 6.0), 150 mM NaCl, 3% D2O (v/v), 5 mM
DTT-d10, and 0.02% NaN3, to a final concentration of 50 µM. The potential ligands were
dissolved in NMR buffer at a 20 mM final concentration. Titrations were conducted by
measuring NMR [15N,1H]-HSQC spectra at increasing ligand:protein concentration ratios.
An initial measurement without ligand present and with 2048 (1H) x 256 (15N) points was
taken as a baseline. The ratios of ligand to protein concentration were 0.25:1, 0.5:1,
1.0:1, 5.0:1, 10.0:1, 20.0:1 for cyanocobalamin, ZnCl2, ADP, ADP-ribose, and 10:1 and
20:1 for EDTA.
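The titration scheme above implies a simple volume calculation, sketched here for illustration. The protein concentration (50 µM), ligand stock concentration (20 mM), and ratios are taken from the text; the 500 µL sample volume is a hypothetical value, since the sample volume is not stated, and the function names are illustrative.

```python
# Values quoted in the text.
PROTEIN_UM = 50.0           # protein concentration, µM
LIGAND_STOCK_MM = 20.0      # ligand stock concentration, mM
RATIOS = [0.25, 0.5, 1.0, 5.0, 10.0, 20.0]  # ligand:protein ratios

# Hypothetical starting sample volume; not given in the source.
SAMPLE_UL = 500.0


def cumulative_stock_volume_ul(ratio, protein_um=PROTEIN_UM,
                               stock_mm=LIGAND_STOCK_MM, sample_ul=SAMPLE_UL):
    """Total ligand stock volume (µL) needed to reach a ligand:protein ratio."""
    protein_pmol = protein_um * sample_ul      # µM x µL = pmol
    ligand_pmol = ratio * protein_pmol
    stock_pmol_per_ul = stock_mm * 1000.0      # 1 mM = 1 nmol/µL = 1000 pmol/µL
    return ligand_pmol / stock_pmol_per_ul


def ligand_concentration_um(ratio, protein_um=PROTEIN_UM,
                            stock_mm=LIGAND_STOCK_MM, sample_ul=SAMPLE_UL):
    """Ligand concentration (µM) in the sample after dilution by the stock."""
    added_ul = cumulative_stock_volume_ul(ratio, protein_um, stock_mm, sample_ul)
    return ratio * protein_um * sample_ul / (sample_ul + added_ul)


if __name__ == "__main__":
    for ratio in RATIOS:
        volume = cumulative_stock_volume_ul(ratio)
        print(f"{ratio:>5}:1  add {volume:.3f} µL of stock in total")
```

Because the protein and the ligand are diluted by the same added volume, the molar ratio depends only on the amount of ligand added, not on the final sample volume.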



Electrophoretic mobility shift assays 

Binding assays were conducted by incubating purified HKU9 C protein with a set of
DNA and RNA oligonucleotides available in our laboratory for studying protein-nucleic
acid interactions. Protein-oligonucleotide mixtures were incubated at 25°C for 1 h in
EMSA buffer: 20 mM sodium phosphate (pH 6.0), 75 mM NaCl, and 3% glycerol.
G-quadruplex oligomers were annealed by heating to 95°C for 5 min and slowly cooling to
18°C overnight in buffer: 20 mM sodium phosphate (pH 6.0), 75 mM NaCl. The
mixtures were resolved by native electrophoresis on 10% TBE gels (Invitrogen) for 1 h at
4°C. Gels were stained with SYBR Gold stain (Invitrogen) and visualized by the Safe
Imager 2.0 Blue-Light Transilluminator (Invitrogen).

Supplementary Table 1. List of oligonucleotide sequences used in the electrophoretic
mobility shift assays. Filename: SupplementaryTable-.docx

Supplementary Table 2. Structure validation procedures and results. Filename:
SupplementaryTable2.docx
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Tables 

Table 1. Input for the structure calculation of HKU9 C and the statistics of the 20 
energy-minimized conformers used to represent the solution structure. 

Table 2. Bioinformatics results from HKU9 C structure-based alignment and protein 
function prediction. 



Figure Legends 

Fig. 1. 2D [15N,1H]-HSQC spectrum of the HKU9 C domain. 1.2 mM 15N-labeled 
HKU9 C in a 20 mM sodium phosphate (pH = 6.0), 150 mM NaCl, 3% d10-DTT, 0.02% 
(w/v) NaN3 solution was measured on a Bruker Avance III 600 MHz spectrometer. 
Backbone 15N-1H correlation peaks are indicated by single letter amino acid 
nomenclature. Arg and Trp assigned 15N-1Hε correlation peaks are labeled. The amide 
side chain 15N-1H2 signals from Asn and Gln are shown with horizontal lines. 

Fig. 2. NMR solution structure of the HKU9 C domain. Wall-eye stereo views are 
shown. A) Ribbon representation of the representative conformer (nearest to the mean 
coordinates of the ensemble). Secondary structures are labeled. B) Line representation 
of the 20-conformer ensemble. The polypeptide backbone (blue) and selected side chains 
with solvent accessibility below 15% (red) are shown. C) Topology diagram of HKU9-C. 
The α-helices (red) are indicated by rectangles and the β-sheets (yellow) are indicated 
by arrows. 


Fig. 3. A) Ribbon representations of nonstructural protein 3 C domains. Conserved 
residues or conservative substitutions relative to HKU9 are highlighted in green 
(aliphatic), blue (basic), and red (acidic). Residue numbers are indicated with respect to 
the first residue in each protein. Left to right: HKU9 C, SARS SUD-C (PDB ID: 
2KAF) [35], MHV C domain (PDB ID: 4YPT) [30], human frataxin (PDB ID: 3T3X) [62]. B) C 
domain electrostatic potential surfaces. Red areas represent positively charged regions, 
blue areas represent negatively charged areas, and white areas represent neutral areas. C) 
Structure-based sequence alignment of the C domain in the Rousettus bat coronavirus 
HKU9, related viral proteins, and other proteins in the frataxin fold family: phage 
T4 MotA (PDB ID: 1KAF) [60], hypothetical protein (PDB ID: 1YB3), Psychromonas 
ingrahamii FTXN (PDB ID: 4HS5) [62], Ataxia FTXN (PDB ID: 3T3X) [61], AcsD (PDB ID: 
2W04) [58]. The alignment is based on structural alignments obtained with TM-Align [50]. 
PDB codes are included after each protein name. The residue numbers for HKU9-C are 
indicated. Alpha helix regions are displayed in red (cylinders) and beta strands are 
shown in blue (arrows). Gaps are shown as dashes (-) and insertions where additional 
secondary structures are present are indicated by forward slash marks (//). Residues 
indicated by stars (*) are discussed in the text and are involved in potential functional 
sites. The corresponding Dali scores for the pairwise alignment of each protein with 
HKU9 C and the percent amino acid identity between each protein and HKU9 C domain 
are listed. Dali scores of 2.0 and higher indicate significant structural similarity [51]. 



Fig. 4. Overlay of HKU9-C (green) and SARS-C (cyan) backbone with secondary 
structures shown. Sidechains from Arg 588-Asp 589-Trp 590 and Lys 609-Arg 610-Gly 
611 (HKU9 C) and Arg 670-Asp 671-Trp 672 and Lys 687-Arg 688-Gly 689 (SARS C) 
are shown with the corresponding one-letter amino acid code. 


Supplementary Figure 1. a) Long-range NOEs ( |i-j| > 5 ) observed in the HKU9 C 
domain are plotted as blue lines. The amino acid sequence is shown in one-letter code. 
Secondary structures are indicated above the sequence. b) NOE restraints for the HKU9 
C structure determination versus the HKU9 C domain sequence. Color code: 
intraresidue, white; sequential, light gray; medium-range, dark gray; long-range, black. 
c) Sequential and medium-range NOEs versus the HKU9 C domain sequence. The 
intensity of the NOE (strong, medium or weak) is indicated by the height of the bar. 
Secondary structures are shown above the sequence. 

Table 1. Input for the structure calculation of HKU9 C and the statistics of the 20 energy- 
minimized conformers used to represent the solution structure. 

Quantity                                  Value 
NOE upper distance limits                 1828 
  Intraresidue (|i-j| = 0)                311 
  Sequential (|i-j| = 1)                  528 
  Medium-range (1 < |i-j| < 5)            329 
  Long-range (|i-j| > 5)                  660 
Dihedral angle constraints 
  Talos+                                  132 
  HABAS (CYANA)                           347 
NOEs per residue                          23.74 
Long-range NOEs per residue               8.57 
CYANA minimized target function           1.70 ± 0.40 
Residual NOE violations 
  Number > 0.2 Å                          6 
  RMS violation                           0.0214 
Residual dihedral angle violations 
  Number > 5.0°                           1 
  RMS violation                           0.3553 
RMSD from ideal geometry ᵃ 
  Bond lengths, Å                         0.016 
  Bond angles, °                          2.8 
RMSD to the mean coordinates, Å ᵃ 
  Backbone (574-645) ᵇ                    0.34 ± 0.11 
  Heavy atom (574-645) ᵇ                  0.72 ± 0.08 
Ramachandran plot statistics ᵃ 
  Most favored regions (%)                90.2 
  Allowed regions (%)                     6.1 
  Disallowed regions (%)                  3.7 

ᵃ As determined by MOLPROBITY [79]. Calculated using PSVS version 1.5 [54]. 
ᵇ Residue range used to calculate the backbone and heavy-atom RMSDs. 
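
The restraint counts in Table 1 can be cross-checked with simple arithmetic. The sketch below sums the four NOE classes and back-calculates the number of residues implied by the per-residue averages. The residue count is inferred from the reported ratios rather than stated in the table, so it should be read as an assumption.

```python
# NOE upper distance limits by class, as listed in Table 1.
noe_counts = {
    "intraresidue": 311,
    "sequential": 528,
    "medium_range": 329,
    "long_range": 660,
}

TOTAL_REPORTED = 1828            # "NOE upper distance limits"
NOES_PER_RESIDUE = 23.74         # "NOEs per residue"
LONG_RANGE_PER_RESIDUE = 8.57    # "Long-range NOEs per residue"

# The four classes should add up to the reported total.
total = sum(noe_counts.values())

# ASSUMPTION: the per-residue values are total / residue count, so the
# residue count can be inferred by rounding the quotient.
implied_residues = round(TOTAL_REPORTED / NOES_PER_RESIDUE)

# Recompute both per-residue averages from the inferred residue count.
recomputed_all = round(total / implied_residues, 2)
recomputed_long = round(noe_counts["long_range"] / implied_residues, 2)

print("sum of classes:", total)
print("implied residue count:", implied_residues)
print("NOEs per residue:", recomputed_all)
print("long-range NOEs per residue:", recomputed_long)
```

Under this assumption, both per-residue averages are reproduced by the same residue count, which suggests the tabulated values are internally consistent.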

Table 2. Bioinformatics results from HKU9 C structure-based alignment and protein function prediction. 

Server       Identifier      Protein Name                  RMSD    %ID 
TM-Align ᵃ   PDB ID: 2KAF    SARS nsp3 C domain            1.66    18 
             PDB ID: 4YPT    MHV nsp3 domain               2.21    15 
             PDB ID: 1KAF    Phage T4 MotA                 2.84    7 
             PDB ID: 1YB3    P. furiosus 178653-001        3.12    1 
DALI ᵇ       PDB ID: 2KAF    SARS nsp3 C domain            1.8     19 
             PDB ID: 4YPT    MHV nsp3 domain               2.3     16 
             PDB ID: 1KAF    Phage T4 MotA                 2.7     9 
             PDB ID: 4HS5    P. ingrahamii FTXN            2.8     9 
             PDB ID: 3T3X    Friedreich's Ataxia FTXN      2.9     6 
             PDB ID: 1YB3    P. furiosus 178653-001        3.3     1 

Prediction   Ligand                         Binding   Structurally Similar Proteins          Alignment Scores ᵈ 
Server                                      Site ᶜ                                           (C, BS) 
COFACTOR     DNA                            β         Ring1B/Bmi1/UbcH5c PRC1 (4R8P)         0.01, 0.86 
             Peptide                        α/β       Glycyl-tRNA Synthetase (1ATI)          0.01, 0.77 
             Cyanocobalamin (Vitamin B12)   CF        Glycerol Dehydratase (1MMF)            0.01, 0.20 
             ATP                            β         Human Glycyl-tRNA synthetase (2ZT7)    0.01, 0.15 
COACH        Calcium ions                   α         E. coli ROM variant (1F4M)             0.12 
             Zinc ions                      α         Acyl Carrier Protein (2QNW)            0.11 
             "GAANDENY"                     CF        AAA+ delivery protein (1OU8)           0.09 
             "QRKWYPLRP"                    CF        Knl1/Nsl1 complex (4NF9)               0.05 
             Peptide                        α/β       Glycyl-tRNA Synthetase (1ATI)          0.04 
             DNA                            β         Ring1B/Bmi1/UbcH5c PRC1 (4R8P)         0.02 
             Serotonin                      α         AM 182 Serotonin Complex (3BRN)        0.02 
TM-Site      Zinc(II) ions                  α         Ferric Enterobactin (2CHU)             0.37 
             Apcin                          β         Cdc20 (4N14)                           0.18 
FINDSITE     "GAANDENY"                     CF        AAA+ delivery protein (1OU8)           0.27 
             N-acetyl-mannosamine           α         L-Ficolin protein (2JOG)               0.13 
             Serotonin                      α         AM 182 Serotonin Complex (3BRN)        0.06 

ᵃ TM-Align [25] scales structure similarity to protein templates from residue-specific alignment. 
ᵇ DALI [26] server uses structure-based templates to provide matches in structure and function. 
ᶜ Predicted binding regions of the HKU9 C protein are described: residues from the α-helices (α), residues 
from the solvent-accessible β-sheet (β), and the conserved polar face containing the Arg-Asp-Trp and 
Lys-Arg-Gly motif (CF). 
ᵈ The confidence score (C-score) is used to evaluate the reliability of the prediction. The binding site score 
(BS-score) evaluates how significant is the match between the predicted binding site and the template 
binding site. Alignment score values range from 0 to 1, with higher values having greater significance. 